Building an Audio-Visual Corpus of Australian English: Large Corpus Collection with an Economical Portable and Replicable Black Box
نویسندگان
چکیده
The Big Australian Speech Corpus project incorporates the strategic goals of 30 Chief Investigators from various speech science areas. Speech from 1000 geographically and socially diverse speakers is being recorded using a uniform and automated protocol plus standardized hardware and software to produce a widely applicable and extensible database – AusTalk. Here we describe the project’s major components and organization; share the lessons learnt from difficulties and challenges; and present the results achieved so far.
منابع مشابه
AusTalk: an audio-visual corpus of Australian English
This paper describes the AusTalk corpus, which was designed and created through the Big ASC, a collaborative project with the two main goals of providing a standardised infrastructure for audio-visual recordings in Australia and of producing a large audio-visual corpus of Australian English, with 3 hours of AV recordings for 1000 speakers. We first present the overall project, then describe the...
متن کاملStrategies Used in the Translation of Interlingual Subtitling
This study was an attempt to identify the interlingual strategies employed to translate English subtitles into Persian and to determine their frequency, as well. Contrary to many countries, subtitling is a new field in Iran. The study, a corpus-based, comparative, descriptive, non-judgmental analysis of an English-Persian parallel corpus, comprised English audio scripts of five movies of differ...
متن کاملThe audio-video australian English speech data corpus AVOZES
This paper presents the Audio-Video Australian English Speech data corpus AVOZES. It contains recordings of 20 speakers uttering a variety of phrases. The corpus was designed for research on the statistical relationship of audio and video speech parameters with an audio-video (AV) automatic speech recognition (ASR) task in mind, but may be useful for other research tasks. AVOZES is the first pu...
متن کاملMetadiscourse Markers in a Corpus of Learner Language: The Case of Iranian EFL Learners
Different issues have been probed in learner corpus research since the late 1980s.However, taking the im- portance of meta discourse markers (MDMs) in signposting academic discourse, their use in Iranian EFL learners‟ academic essays is an area of research in need of a more serious analysis. Contributing to this line of investigation, this paper reports a corpus-based study of the use of MDMs i...
متن کاملA Detailed Description of the AVOZES Data Corpus
The AVOZES data corpus has recently been made publicly available for other interested researchers. It is the first publicly available audio-video speech data corpus for Australian English. It contains recordings from 20 speakers and the sequences provide both a systematic coverage of the phonemes and visemes of Australian English as well as some application-driven utterances. AVOZES is also the...
متن کامل